18.05 Lecture 25 
April 11, 2005 



Maximum Likelihood Estimators

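$X_1, \ldots, X_n$ have distribution $P_{\theta_0} \in \{P_\theta : \theta \in \Theta\}$

Joint p.f. or p.d.f.: $f(x_1, \ldots, x_n) = f(x_1|\theta) \times \cdots \times f(x_n|\theta) = \varphi(\theta)$ - likelihood function.

If $P_\theta$ is discrete, then $f(x|\theta) = P_\theta(X = x)$,

and $\varphi(\theta)$ is the probability to observe $X_1, \ldots, X_n$.

Definition: A maximum likelihood estimator (M.L.E.):
$\hat{\theta} = \hat{\theta}(X_1, \ldots, X_n)$ such that $\varphi(\hat{\theta}) = \max_\theta \varphi(\theta)$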

Suppose that there are two possible values of the parameter, $\theta = 1$, $\theta = 2$
p.f./p.d.f. - $f(x|1)$, $f(x|2)$
Then observe points $x_1, \ldots, x_n$

Compare the probability of the sample under the first parameter and under the second:

$\varphi(1) = f(x_1, \ldots, x_n|1) = 0.1$, $\varphi(2) = f(x_1, \ldots, x_n|2) = 0.001$.

The parameter is much more likely to be 1 than 2. 
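
As a minimal sketch of this comparison, the snippet below assumes a hypothetical model in which f(x|theta) is the normal density with mean theta and variance 1 (the notes do not name a model here) and uses a made-up sample. It evaluates the likelihood at theta = 1 and theta = 2 and picks the larger one.

```python
import math

def density(x, theta):
    # Assumed model, for illustration only: normal density, mean theta, variance 1.
    return math.exp(-(x - theta) ** 2 / 2) / math.sqrt(2 * math.pi)

def likelihood(theta, xs):
    # phi(theta) = f(x_1 | theta) * ... * f(x_n | theta)
    return math.prod(density(x, theta) for x in xs)

sample = [0.8, 1.3, 0.6, 1.1]  # made-up observations
phi = {theta: likelihood(theta, sample) for theta in (1, 2)}

# With only two candidate values, the MLE is whichever has the larger likelihood.
theta_hat = max(phi, key=phi.get)
print(phi, theta_hat)
```

With these particular numbers the sample sits near 1, so the likelihood at theta = 1 is the larger of the two.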

Example: Bernoulli Distribution $B(p)$, $p \in [0, 1]$,

$\varphi(p) = f(x_1, \ldots, x_n|p) = p^{\sum x_i}(1 - p)^{n - \sum x_i}$

$\varphi(p) \to \max \Leftrightarrow \log \varphi(p) \to \max$ (log-likelihood)

$\log \varphi(p) = \sum x_i \log p + (n - \sum x_i) \log(1 - p)$, maximize over $[0, 1]$

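Find the critical point:

$\frac{d}{dp} \log \varphi(p) = \frac{\sum x_i}{p} - \frac{n - \sum x_i}{1 - p} = 0$

$\sum x_i (1 - p) - p(n - \sum x_i) = \sum x_i - p \sum x_i - np + p \sum x_i = 0$

$\hat{p} = \frac{\sum x_i}{n} = \bar{x} \to E(X) = p$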

For Bernoulli distribution, the MLE converges to the actual parameter of the distribution, p. 
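
The Bernoulli result can be checked numerically. The sketch below uses a made-up sample of ten flips and compares the closed-form estimate (the sample mean) with a brute-force grid search over the log-likelihood; the function names and the grid resolution are illustrative choices, not part of the lecture.

```python
import math

def bernoulli_log_likelihood(p, xs):
    # log phi(p) = sum(x_i) * log p + (n - sum(x_i)) * log(1 - p)
    s, n = sum(xs), len(xs)
    return s * math.log(p) + (n - s) * math.log(1 - p)

def bernoulli_mle(xs):
    # Closed-form MLE derived above: the sample mean.
    return sum(xs) / len(xs)

sample = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]  # made-up data: 7 ones, 3 zeros
p_hat = bernoulli_mle(sample)

# Independent check: maximize the log-likelihood over a grid inside (0, 1).
grid = [k / 1000 for k in range(1, 1000)]
p_grid = max(grid, key=lambda p: bernoulli_log_likelihood(p, sample))

print(p_hat, p_grid)
```

The grid excludes the endpoints 0 and 1, where the logarithm is undefined for this sample.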
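Example: Normal Distribution: $N(\mu, \sigma^2)$,

$f(x|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$

$\varphi(\mu, \sigma^2) = \left(\frac{1}{\sqrt{2\pi}\sigma}\right)^n e^{-\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2}$

$\log \varphi(\mu, \sigma^2) = -n \log(\sqrt{2\pi}\sigma) - \frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2$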
Note that the two parameters are decoupled. 
First, for a fixed $\sigma$, we minimize $\sum_{i=1}^n (x_i - \mu)^2$ over $\mu$

$\frac{d}{d\mu} \sum_{i=1}^n (x_i - \mu)^2 = -2 \sum_{i=1}^n (x_i - \mu) = 0$

$\sum_{i=1}^n x_i - n\mu = 0$, $\hat{\mu} = \frac{1}{n} \sum_{i=1}^n x_i = \bar{x} \to E(X) = \mu_0$

To summarize, the estimator of $\mu$ for a Normal distribution is the sample mean.
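To find the estimator of the variance:

$-n \log(\sqrt{2\pi}\sigma) - \frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \bar{x})^2 \to$ maximize over $\sigma$

$\frac{d}{d\sigma}: -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^n (x_i - \bar{x})^2 = 0$

$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2$ - MLE of $\sigma_0^2$; $\hat{\sigma}^2$ - a sample variance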



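Find the limit by expanding the square:

$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n x_i^2 - \bar{x}^2 \to E(X^2) - (E(X))^2 = \sigma_0^2$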

To summarize, the estimator of $\sigma_0^2$ for a Normal distribution is the sample variance.
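
A short numerical sketch of the Normal case follows. It computes the two closed-form estimates on a made-up sample and checks that nudging either parameter lowers the log-likelihood. The data and function names are illustrative assumptions; note that the variance estimate divides by n, as in the derivation above.

```python
import math

def normal_log_likelihood(mu, sigma, xs):
    # log phi(mu, sigma^2) = -n log(sqrt(2 pi) sigma) - sum((x_i - mu)^2) / (2 sigma^2)
    n = len(xs)
    return (-n * math.log(math.sqrt(2 * math.pi) * sigma)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))

def normal_mle(xs):
    # Closed-form MLEs: sample mean, and variance with divisor n.
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma2_hat

data = [2.1, 1.9, 3.2, 2.8, 2.0]  # made-up observations
mu_hat, sigma2_hat = normal_mle(data)
sigma_hat = math.sqrt(sigma2_hat)
best = normal_log_likelihood(mu_hat, sigma_hat, data)

# Perturbing either parameter should not increase the log-likelihood.
nearby = [
    normal_log_likelihood(mu_hat + 0.1, sigma_hat, data),
    normal_log_likelihood(mu_hat - 0.1, sigma_hat, data),
    normal_log_likelihood(mu_hat, sigma_hat * 1.1, data),
    normal_log_likelihood(mu_hat, sigma_hat * 0.9, data),
]
print(mu_hat, sigma2_hat, all(v < best for v in nearby))
```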
Example: $U(0, \theta)$, $\theta > 0$ - parameter.

$f(x|\theta) = \begin{cases} \frac{1}{\theta}, & 0 \le x \le \theta \\ 0, & \text{otherwise} \end{cases}$

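Here, when finding the maximum we need to take into account that the distribution is supported on a
finite interval $[0, \theta]$.

$\varphi(\theta) = \prod_{i=1}^n \frac{1}{\theta} I(0 \le x_i \le \theta) = \frac{1}{\theta^n} I(0 \le x_1, x_2, \ldots, x_n \le \theta)$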

The likelihood function will be 0 if any points fall outside of the interval.
A value of $\theta$ under which the observed sample has probability 0 cannot be the correct parameter:
you chose the wrong $\theta$ for your distribution.

$\varphi(\theta) \to$ maximize over $\theta > 0$
If you graph the likelihood as a function of $\theta$, notice that it drops to 0 when $\theta$ drops below the maximum data point, and that above it the likelihood equals $\theta^{-n}$, which is decreasing in $\theta$.

$\hat{\theta} = \max(X_1, \ldots, X_n)$

The estimator converges to the actual parameter $\theta_0$.

As you keep choosing points, the maximum gets closer and closer to $\theta_0$.
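
The behaviour of the uniform MLE can be illustrated with a small simulation. The sketch below assumes a true parameter of 3.0 and a fixed random seed, both arbitrary choices made for reproducibility, and also evaluates the likelihood on a tiny made-up sample to show the drop to 0 below the largest data point.

```python
import random

def uniform_likelihood(theta, xs):
    # phi(theta) = theta^(-n) if every point lies in [0, theta], else 0.
    if theta <= 0 or min(xs) < 0 or max(xs) > theta:
        return 0.0
    return theta ** (-len(xs))

def uniform_mle(xs):
    # The likelihood is 0 below max(xs) and decreasing above it.
    return max(xs)

# Likelihood shape on a tiny made-up sample.
small = [0.4, 2.7, 1.1, 1.9]
print(uniform_likelihood(2.6, small),   # below the maximum: 0
      uniform_likelihood(2.7, small),   # at the maximum: largest value
      uniform_likelihood(3.0, small))   # above the maximum: smaller

# Convergence of the estimator as the sample grows (assumed theta_0 = 3.0).
random.seed(0)
theta_0 = 3.0
estimates = {}
for n in (10, 100, 10000):
    sample = [random.uniform(0, theta_0) for _ in range(n)]
    estimates[n] = uniform_mle(sample)
print(estimates)
```

The estimate never exceeds the true parameter, so it approaches it from below.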

Sketch of the consistency of MLE.

$\varphi(\theta) \to \max \Leftrightarrow \frac{1}{n} \log \varphi(\theta) \to \max$



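$L_n(\theta) = \frac{1}{n} \log \varphi(\theta) = \frac{1}{n} \log \prod_{i=1}^n f(x_i|\theta) = \frac{1}{n} \sum_{i=1}^n \log f(x_i|\theta) \to L(\theta) = E_{\theta_0} \log f(x_1|\theta)$.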




$L_n(\theta)$ is maximized at $\hat{\theta}$, by definition of MLE. Let us show that $L(\theta)$ is maximized at $\theta_0$.
Then, evidently, $\hat{\theta} \to \theta_0$. We show that $L(\theta) \le L(\theta_0)$:
Expand the inequality:



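$L(\theta) - L(\theta_0) = \int \log \frac{f(x|\theta)}{f(x|\theta_0)} f(x|\theta_0) dx \le \int \left(\frac{f(x|\theta)}{f(x|\theta_0)} - 1\right) f(x|\theta_0) dx$

$= \int (f(x|\theta) - f(x|\theta_0)) dx = 1 - 1 = 0.$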




Here, we used that the graph of the logarithm lies below the line $y = x - 1$ except at the tangent
point $x = 1$, that is, $\log t \le t - 1$.
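
The key step of the sketch can be checked numerically in a simple case. The snippet below assumes a Bernoulli model with true parameter 0.3 (an arbitrary choice for illustration), for which the expectation defining L(theta) has a closed form, and confirms on a grid that L(theta) - L(theta_0) is never positive and that the maximizer is theta_0. It also spot-checks the inequality log t <= t - 1.

```python
import math

def expected_log_likelihood(theta, theta_0):
    # L(theta) = E_{theta_0} log f(X | theta) for a Bernoulli model:
    # X = 1 with probability theta_0, and f(1|theta) = theta, f(0|theta) = 1 - theta.
    return theta_0 * math.log(theta) + (1 - theta_0) * math.log(1 - theta)

theta_0 = 0.3  # assumed true parameter, for illustration only
grid = [k / 100 for k in range(1, 100)]

# L(theta) - L(theta_0) should be <= 0 everywhere, with equality at theta_0.
gaps = [expected_log_likelihood(t, theta_0) - expected_log_likelihood(theta_0, theta_0)
        for t in grid]
theta_best = max(grid, key=lambda t: expected_log_likelihood(t, theta_0))

# The inequality used in the proof: log t <= t - 1, with equality at t = 1.
ts = [0.1, 0.5, 1.0, 2.0, 10.0]
log_bound_holds = all(math.log(t) <= t - 1 for t in ts)

print(max(gaps), theta_best, log_bound_holds)
```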



End of Lecture 25 